AMRflows Metagenomic Data Analysis Course – exploring-the-shell-environment

4. Exploring Your Environment

Navigating Files and Directories

Now that you’ve become acquainted with launching your command line interface and using shell, let’s delve into the practical aspects - managing your files and directories.

Navigating the Terrain

Our first task is to master navigation on the command line:

Type cd (short for “change directory”) followed by a directory name, such as cd Documents, and press Enter. We will go over how to move up the directory tree later in this section.
If you ever find yourself disoriented and need to return to your starting point, simply type cd without any additional arguments and press Enter. This will take you to your home directory.

Location Awareness

If you are not sure what directory you are in and what the directory tree is you can use this command:

Type pwd (short for “print working directory”) and press Enter. This will print your absolute path - more on that later.

File and Directory Management

Now let’s look at how to create and delete files and directories.

Creating Files and Directories

To create a new directory, type mkdir DirectoryName and press Enter. Replace DirectoryName with your desired directory name.
To generate a new text file, type touch FileName.txt and press Enter. Insert your chosen name after FileName. It’s akin to composing a new document.

Deleting Files and Directories

Let’s also learn how to delete files and directories:

To delete a file, type rm FileName.txt and press Enter. This is not reversable!
To delete an entire directory, type rm -r DirectoryName and press Enter. This is not reversable!

Copying and Relocating

Now, let’s learn how to move and copy files.

Copying Files and Directories

To duplicate a file, type cp SourceFile.txt DestinationDirectory/ and press Enter. Substitute SourceFile with the file to be duplicated and DestinationDirectory with the target directory. This will make a copy of SourceFile in the DestinationDirectory.

cp SourceFile.txt DestinationDirectory/

Moving Files and Directories

To move a file instead, type mv SourceFile.txt DestinationDirectory/ and press Enter.

mv SourceFile.txt DestinationDirectory/

Renaming Files and Directories

Occasionally, you may want to rename your files and directories:

To rename a file, we again use the mv command, except this this we give it put its new name after the file name instead of a destination directory.

mv OldFileName.txt NewFileName.txt

To rename a directory, we do the same thing as with a file, but give it a new directory name:

mv OldDirectoryName/ NewDirectoryName/

Viewing Files

In your research, you’ll often need to inspect the content of files, whether they contain DNA sequences, metadata, or analysis results. Let’s explore some commonly used commands for viewing the contents of files within your command-line environment.

Viewing Text Files

Text files are frequently used in bioinformatics for storing data and information. To view the contents of a text file, you can use the following commands:

cat: This command displays the entire contents of a text file directly in your terminal. For instance, to view the contents of a file called sequence.txt, you would use:

cat sequence.txt

less: This command allows you to view large text files one screen at a time, making it easier to navigate through lengthy documents. To use less, simply type:

less sequence.txt

You can navigate through the file using the arrow keys and exit by pressing q.

head and tail: These commands display the first or last few lines of a file, respectively. For example, to view the first 10 lines of a file:

head -n 10 sequence.txt

To view the last 20 lines of a file:

tail -n 20 sequence.txt

Searching text files.

If you want to search text files for a specific pattern or string we can use the bash command grep.

Lets assume we had a file called fruitprices.txt that looks like this:

Fruit   Price
Pineapple   40
Mango   60
Banana  20
Peach   25

If we wanted to check the price of mangos we would use:

grep "mango" fruitprices.txt

Which would return:

Mango 60

As we can see, grep returns the entire line that the search pattern is found in. By default it is not case-sensitive.

If fruitprices.txt looked like this instead:

Fruit   Price
Pineapple   40
Mango   60
Banana  20
Peach   25
Big mango   70

And we used the same command as above, it would return:

Mango 60
Big mango 70

Notice that grep returns every single line that contians the pattern/string we searched for. grep has a lot of different flags and options that changes its behaviour and make it a very powerful tool. Make sure to use --help to find out more.

Understanding Absolute and Relative Paths

When navigating the command line, it’s crucial to understand how to specify the location of files and directories accurately. This is where the concepts of absolute and relative paths come into play.

Absolute Paths

An absolute path provides the complete and unambiguous location of a file or directory in the file system, starting from the root directory. The root directory is the top level of the directory tree and contains all directories below it. It is represented by a single forward slash (/).

For example, if your home directory is /home/username (the directory you enter when you type cd) and you want to view a file called data.txt located in the Documents directory within your home directory, you would use the absolute path like this:

less /home/username/Documents/data.txt

Absolute paths are useful when you need to access files or directories from anywhere in the file system, regardless of your current working directory. They specify the precise location without any ambiguity.

Relative Paths

In contrast, relative paths specify the location of a file or directory relative to your current working directory. They are typically shorter and more convenient to use, as we often don’t want to type the absolute path when a file in many directories deep.

Here are some examples of relative paths:

To view the data.txt file in your current directory, you can simply use its name:

less data.txt

To access a file or directory in a subdirectory, you can use a path relative to your current directory. For instance, to view the config.txt file in a directory called settings located within your current directory:

less settings/config.txt

To move up one directory level, you can use ... For example, if you are in the settings directory and want to view a file in the parent directory:

less ../parentfile.txt

Relative paths are handy when you are working within a specific context, such as organizing files within a project directory.

These relative and absolute paths work with all commands not just ones to look at text files. For example, if you wanted to navigate around we use the cd command we introduced earlier.

We use pwd and find that we are in /home/username/Documents/amrflows/. pwd always prints out the absolute path. We want to move up a directory back into /home/username/Documents/.

We could move there using absolute paths:

cd /home/username/Documents/

or we could recognise that we only want to move up one directory and can use relative paths:

cd ../

See how much shorter the command is. This becomes important when our commands become more complicated.

Previous Commands

You can easily access previous commands you have executed by pressing the up arrow key. This will allow you to step back command-by-command and quickly rerun things or check what you’ve written.

Connection loss

Note: this is only relevant if you’re connecting to a server. You can skip this section if you are running your commands locally on your own computer.

If you’ve connected to a server through SSH protocol, and you lose connection through internet failure, terminal crashing, or just closing your laptop/computer, you’ll lose all progress on what you’re currently working on. For example, if you are in the middle of running a program that takes hours or days to complete, you will have to start from the beginning.

To avoid this, we can use a program called screen. Screen is a virtual command line within a command line which continues to function even when connection is lost. To start a screen we can use the command:

screen -S <screen_name>

You can exit a screen using:

screen -d

You can get a list of the screen instances running on your server with:

screen -ls

You can reattach to your screen of choice with screen -r followed by the name or number of the screen given by screen -ls.

Recap and Excerice

That was quite a bit to digest, but don’t fret if you can’t recall everything immediately. The most effective way to become proficient is through practice.

Exercise

Move into your home directory.
Create a new directory called “Metagenomics_Project.”
Move into the new directory
Generate a new text file named “SampleData.txt.”
Move the “SampleData.txt” file to your home directory (the starting location when you open your command line).
Rename the file to “MyData.txt.”
List the contents of your home directory to verify that everything is in its rightful place.

Example code for exercise

cd
mkdir Metagenomics_Project 
cd Metagenomics_Project 
touch SampleData.txt 
mv SampleData.txt ../ 
cd ../
mv SampleData.txt MyData.txt 
ls

Congratulations! You’ve just explored the fundamentals of managing files and directories using the command line. In the upcoming section, we’ll delve deeper into bioinformatic tools and how to install them.